In this paper we present an extensive evaluation of visual descriptors for the content-based retrieval of remote sensing (RS) images. The evaluation includes global hand-crafted, local hand-crafted, and Convolutional Neural Network (CNN) features, coupled with four different content-based image retrieval schemes. We conducted all the experiments on two publicly available datasets: the 21-class UC Merced Land Use/Land Cover (LandUse) dataset and the 19-class High-resolution Satellite Scene dataset (SceneSat). The content of RS images can be quite heterogeneous, ranging from images containing fine-grained textures, to coarse-grained ones, to images containing objects. It is therefore not obvious which descriptor should be employed in this domain to describe images with such variability. Results demonstrate that CNN-based features outperform both global and local hand-crafted features whatever retrieval scheme is adopted. Features extracted from SatResNet-50, a residual CNN suitably fine-tuned on the RS domain, show much better performance than those from a residual CNN pre-trained on multimedia scene and object images. Features extracted from NetVLAD, a CNN that combines CNN and local features, work better than other CNN solutions on images that contain fine-grained textures and objects.
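To make the retrieval setting concrete, the following is a minimal sketch of similarity-based retrieval over pre-extracted image descriptors. The function names, the 4-D toy vectors, and the image identifiers are all illustrative assumptions, not the paper's actual pipeline or datasets; in practice the descriptors would be the hand-crafted or CNN features evaluated above.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two descriptor vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve(query, database, top_k=3):
    # Rank database images by descriptor similarity to the query
    # and return the identifiers of the top_k most similar ones.
    scored = [(name, cosine_similarity(query, feat))
              for name, feat in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:top_k]]

# Toy 4-D descriptors standing in for real image features.
db = {
    "harbor_01": [0.9, 0.1, 0.0, 0.2],
    "forest_03": [0.1, 0.8, 0.3, 0.0],
    "harbor_07": [0.8, 0.2, 0.1, 0.3],
}
print(retrieve([0.85, 0.15, 0.05, 0.25], db, top_k=2))
```

The descriptor (global, local, or CNN-based) only changes how the vectors are computed; the ranking step stays the same, which is why the paper can compare very different features under common retrieval schemes.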